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27 Abstract 

28 

29 Background 

30 The identification of apparently conserved gene complements in the venom and salivary 

31 glands of a diverse set of reptiles led to the development of the Toxicofera hypothesis - the 

32 idea that there was a single, early evolution of the venom system in reptiles. However, this 

33 hypothesis is based largely on relatively small scale EST-based studies of only venom or 

34 salivary glands and toxic effects have been assigned to only some of these putative 

35 Toxcoferan toxins in some species. We set out to investigate the distribution of these putative 

36 venom toxin transcripts in order to investigate to what extent conservation of gene 

37 complements may reflect a bias in previous sampling efforts. 

38 Results 

39 We have carried out the first large-scale test of the Toxicofera hypothesis and found it 

40 lacking in a number of regards. Our quantitative transcriptomic analyses of venom and 

41 salivary glands and other body tissues in five species of reptile, together with the use of 

42 available RNA-Seq datasets for additional species shows that the majority of genes used to 

43 support the establishment and expansion of the Toxicofera are in fact expressed in multiple 

44 body tissues and most likely represent general maintenance or "housekeeping" genes. The 

45 apparent conservation of gene complements across the Toxicofera therefore reflects an 

46 artefact of incomplete tissue sampling. In other cases, the identification of a non-toxic 

47 paralog of a gene encoding a true venom toxin has led to confusion about the phylogenetic 

48 distribution of that venom component. 

49 Conclusions 

50 Venom has evolved multiple times in reptiles. In addition, the misunderstanding regarding 

51 what constitutes a toxic venom component, together with the misidentification of genes and 

52 the classification of identical or near-identical sequences as distinct genes has led to an 
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53 overestimation of the complexity of reptile venoms in general, and snake venom in particular, 

54 with implications for our understanding of (and development of treatments to counter) the 

55 molecules responsible for the physiological consequences of snakebite. 
56 

57 Keywords 

58 Snake venom, Toxicofera, Transcriptomics, 
59 

60 Background 

61 Snake venom is frequently cited as being highly complex or diverse [1-3] and a large number 

62 of venom toxin genes and gene families have been identified, predominantly from EST-based 

63 studies of gene expression during the re-synthesis of venom in the venom glands following 

64 manually-induced emptying ("milking") [4-8] and some proteomic studies of extracted 

65 venom. It has been suggested that many of these gene families have originated via the 

66 duplication of a gene encoding a non-venom protein expressed elsewhere in the body 

67 followed by recruitment into the venom gland where natural selection can act to increase 

68 toxicity, with subsequent additional duplications leading to a diversification within gene 

69 families, often in a species-specific manner [9-11]. However, since whole genome 

70 duplication is a rare event in reptiles [12], the hypothesis that novelty in venom originates via 

71 the duplication of a "body" gene with subsequent recruitment into the venom gland requires 

72 both that gene duplication is a frequent event in the germline of venomous snakes and that the 

73 promoter and enhancer sequences that regulate venom gland-specific expression are 

74 relatively simple and easy to evolve. It also suggests a high incidence of neofunctionalisation 

75 rather than the more common process of subfunctionalisation [13-16]. 

76 The apparent widespread distribution of genes known to encode venom toxins in snakes in 

77 the salivary glands of a diverse set of reptiles, including both those that had previously been 

78 suggested to have secondarily lost venom in favour of constriction or other predation 

3 



Downloaded from http://biorxiv.org/ on September 18, 2014 

79 techniques and those that had previously been considered to have never been venomous led to 

80 the development of the Toxicofera hypothesis - the single, early evolution of venom in 

81 reptiles [17-19] (Figure 1). Analysis of a wide range of reptiles, including charismatic 

82 megafauna such as the Komodo dragon, Varanus komodoensis [20], has shown that the basal 

83 Toxicoferan venom system comprises at least 16 genes, with additional gene families 

84 subsequently recruited in different lineages [18, 19, 21]. 

85 Although toxic effects have been putatively assigned to some Toxicoferan venom proteins in 

86 some species, the problem remains that their identification as venom components is based 

87 largely on their expression in the venom gland during venom synthesis and their apparent 

88 relatedness to other, known toxins in phylogenetic trees. It has long been known that all 

89 tissues express a basic set of "housekeeping" or maintenance genes [22] and it is therefore 

90 not surprising that similar genes might be found to be expressed in similar tissues in different 

91 species of reptiles and that these genes might group together in phylogenetic trees. However, 

92 the identification of transcripts encoding putative venom toxins in other body tissues would 

93 cast doubt on the classification of these Toxicoferan toxins as venom components, as it is 

94 unlikely that the same gene could fulfil toxic and non-toxic roles without evidence for 

95 alternative splicing to produce a toxic variant (as has been suggested for acetylcholinesterase 

96 in the banded krait, Bungarus fasciatus [11, 23]) or increased expression levels in the venom 

97 gland (where toxicity might be dosage dependent). In order to address some of these issues 

98 and to test the robustness of the Toxicofera hypothesis we have carried out a comparative 

99 transcriptomic survey of the venom or salivary glands, skin and cloacal scent glands of five 

100 species of reptile. Unlike the pancreas and other parts of the digestive system [24, 25], these 

101 latter tissues (which include a secretory glandular tissue (the scent gland) and a relatively 

1 02 inert, non-secretory tissue (skin)) have not previously been suggested to be the source of 

1 03 duplicated venom toxin genes and we would therefore only expect to find ubiquitous 

1 04 maintenance or "housekeeping" genes to be commonly expressed across these tissues. Study 
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105 species included the venomous painted saw-scaled viper (Echis coloratus); the non- 
106 venomous corn snake (Pantherophis guttatus) and rough green snake (Opheodrys aestivus) 

1 07 and a member of one of the more basal extant snake lineages, the royal python {Python 

108 regius). As members of the Toxicofera sensu Fry et al. [21] we would expect to find the basic 

109 Toxicoferan venom genes expressed in the venom or salivary glands of all of these species. In 

1 1 0 addition we generated corresponding data for the leopard gecko (Eublepharis macularius), a 

1 1 1 member of one of the most basal lineages of squamate reptiles that lies outside of the 

112 proposed Toxicofera clade (Figure 1). We have also taken advantage of available 

1 1 3 transcriptomes or RNA-Seq data for corn snake vomeronasal organ [26] and brain [27], garter 

1 1 4 snake (Thamnophis elegans) liver [28] and pooled tissues (brain, gonads, heart, kidney, liver, 

1 1 5 spleen and blood of males and females [29]), eastern diamondback rattlesnake (Crotalus 

1 1 6 adamanteus) and eastern coral snake (Micrurus fulvius) venom glands [7,8, 30], king cobra 

1 1 7 (Ophiophagus hannah) venom gland, accessory gland and pooled tissues (heart, lung, spleen, 

1 1 8 brain, testes, gall bladder, pancreas, small intestine, kidney, liver, eye, tongue and stomach) 

119 [31], Burmese python {Python molurus) pooled liver and heart [32], green anole (Anolis 

120 carolinensis) pooled tissue (liver, tongue, gallbladder, spleen, heart, kidney and lung), testis 

121 and ovary [33] and bearded dragon (Pogona vitticeps), Nile crocodile (Crocodylus niloticus) 

122 and chicken (Gallus gallus) brains [27], as well as whole genome sequences for the Burmese 

1 23 python and king cobra [3 1 , 34] . 

1 24 Assembled transcriptomes were searched for genes previously suggested to be venom toxins 

125 in Echis coloratus and related species [5, 35, 36] as well as those that have been used to 

126 support the Toxicofera hypothesis, namely acetylcholinesterase, AVIT peptide [9, 11, 18, 19, 

127 23], complement c3/cobra venom factor, epididymal secretory protein [19, 37], c-type lectins 

128 [38, 39], cysteine-rich secretory protein (crisp) [40, 41], crotamine [42, 43], cystatin [44, 45], 

129 dipeptidylpeptidase, lysosomal acid lipase, renin aspartate protease [5, 19, 35, 46], 

130 hyaluronidase [47, 48], kallikrein [49, 50], kunitz [51], l-amino-acid oxidase [52, 53], nerve 
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131 growth factor [54, 55], phospholipase A2 [15], phospholipase b [7, 56, 57], ribonuclease [58], 

132 serine protease [59, 60], snake venom metalloproteinase [61, 62], vascular endothelial 

133 growth factor (vegf) [9, 17, 63, 64], veficolin [65], vespryn, waprin [19, 66-68] and 3-finger 

134 toxiw [69]. 

1 35 We find that many genes previously claimed to be venom toxins are in fact expressed in 

136 multiple tissues (Figure 2) and that transcripts encoding these genes show no evidence of 

1 37 consistently elevated expression level in venom or salivary glands compared to other tissues 

138 (Supplemental tables S5-S9). Only two putative venom toxin genes (l-amino acid oxidase b2 

139 and PLA2 IIA-c) showed evidence of a venom gland- specific splice variant across our 

140 multiple tissue data sets. We have also identified several cases of mistaken identity, where 

141 non-orthologous genes have been used to claim conserved, ancestral expression and instances 

142 of identical sequences being annotated as two distinct genes (see later sections). We propose 

143 that the putative ancestral Toxicoferan venom toxin genes do not encode toxic venom 

144 components in the majority of species and that the apparent venom gland- specificity of these 

145 genes is a side-effect of incomplete tissue sampling. Our analyses show that neither increased 

146 expression in the venom gland nor the production of venom- specific splice variants can be 

147 used to support continued claims for the toxicity of these genes. 
148 

149 Results 

150 Based on our quantitative analysis of their expression pattern across multiple species, we 

1 51 identify the following genes as unlikely to represent toxic venom components in the 

152 Toxicofera. The identification of these genes as non- venom is more parsimonious than 

1 53 alternative explanations such as the reverse recruitment of a "venom" gene back to a "body" 

154 gene [70], which requires a far greater number of steps (duplication, recruitment, selection for 

155 increased toxicity, reverse recruitment) to have occurred in each species, whereas a "body" 

1 56 protein remaining a "body" protein is a zero-step process regardless of the number of species 
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1 57 involved. The process of reverse recruitment must also be considered doubtful given the 

158 rarity of gene duplication in vertebrates (estimated to be between 1 gene per 100 to 1 gene 

159 per 1000 per million years [71-73]. 

160 Acetylcholinesterase 

161 We find identical acetylcholinesterase (ache) transcripts in the E. coloratus venom gland and 

162 scent gland (which we call transcript 1) and an additional splice variant expressed in skin and 

1 63 scent gland (transcript 2). Whilst the previously known splice variants in banded krait 

164 (Bungarus fasciatus) are differentiated by the inclusion of an alternative exon, analysis of the 

165 E. coloratus ache genomic sequence (accession number KF1 14031) reveals that the shorter 

1 66 transcript 2 instead comprises only the first exon of the ache gene, with a TAA stop codon 

167 that overlaps the 5' GT dinucleotide splice site in intron 1. ache transcript 1 is expressed at a 

1 68 low level in the venom gland (6.60 FPKM) and is found in multiple tissues in all study 

169 species (Figure 2), as well as corn snake vomeronasal organ and garter snake liver. The 

1 70 shorter transcript 2 is found most often in skin and scent glands (Figure 2, Supplementary 

171 figure SI). The low expression level and diverse tissue distribution of transcripts of this gene 

172 suggest that acetylcholinesterase does not represent a Toxicoferan venom toxin. It should 

1 73 also be noted that the most frequently cited sources for the generation of a toxic version of 

1 74 ache in banded krait via alternative splicing include statements that ache "does not appear to 

175 contribute to the toxicity of the venom" [74], is "not toxic to mice, even at very high doses" 

1 76 [75] and is "neither toxic by itself nor acting in a synergistic manner with the toxic 

177 components of venom" [76]. 

178 AVIT 

1 79 We find only a single transcript encoding an AVIT peptide in our dataset, in the scent gland 

1 80 of the rough green snake (data not shown). The absence of this gene in all of our venom and 

181 salivary gland datasets, as well as the venom glands of the king cobra, eastern coral snake and 

1 82 Eastern diamondback rattlesnake and the limited number of sequences available on Genbank 
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183 (one species of snake, Dendroaspis polylepis (accession number P25687) and two species of 

184 lizard, Varanus varius and Varanus komodoensis (accession numbers AAZ75583 and 

1 85 ABY89668 respectively)) despite extensive sampling, would suggest that it is unlikely to 

1 86 represent a conserved Toxicoferan venom toxin. 

187 Complement C3 ("cobra venom factor") 

188 We find identical transcripts encoding complement c3 in all tissues in all species, with the 

1 89 exception of royal python skin (Figures 2 and 3) and we find only a single complement c3 

1 90 gene in the E. coloratus genome (data not shown). These findings, together with the 

191 identification of transcripts encoding this gene in the liver, brain, vomeronasal organ and 

1 92 tissue pools of various other reptile species (Figure 3) demonstrate that this gene does not 

193 represent a Toxicoferan venom toxin. However, the grouping of additional complement c3 

194 genes in the king cobra (Ophiophagus hannah) and monocled cobra (Naja kaouthia) in our 

195 phylogenetic tree does support a duplication of this gene somewhere in the Elapid lineage. 

196 One of these paralogs may therefore represent a venom toxin in at least some of these more 

197 derived species and the slightly elevated expression level of this gene in the venom or 

1 98 salivary gland of some of our study species suggests that complement c3 has been exapted 

1 99 [77] to become a venom toxin in the Elapids. It seems likely that the identification of the non- 
200 toxic paralogue in other species (including veiled chameleon (Chamaeleo calyptratus), spiny- 

201 tailed lizard (Uromastyx aegyptia) and Mitchell's water monitor {Varanus mitchelli)) has 

202 contributed to confusion about the distribution of this "Cobra venom factor" (which should 

203 more rightly be called complement c3b), to the point where genes in alligator (Alligator 

204 sinensis), turtles (Pelodiscus sinensis) and birds (Columba livid) are now being annotated as 

205 venom factors (accession numbers XP 006023407-8, XP 0061 14685, XP 0055 13793, Figure 

206 3). 

207 Cystatin 
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208 We find two transcripts encoding cystatins expressed in the venom gland of E. coloratus 

209 corresponding to cystatin-e/m and/(Supplementary figures S2 and S3), cystatin-e/m was 

21 0 found to be expressed in all tissues from all species used in this study (Figure 2), as well as 

21 1 corn snake vomeronasal organ and brain and garter snake liver and pooled tissues. The 

21 2 transcript encoding cystatin-f (which has not previously been reported to be expressed in a 

213 snake venom gland) is also expressed in the scent gland of E. coloratus and in the majority of 

21 4 other tissues of our study species. We find no evidence for a monophyletic clade of 

21 5 Toxicoferan cystatin-derived venom toxins and would agree with Richards et al. [45] that low 

21 6 expression level and absence of in vitro toxicity represents a "strong case for snake venom 

21 7 cystatins as essential housekeeping or regulatory proteins, rather than specific prey-targeted 

21 8 toxins. . . " Indeed, it is unclear why cystatins should be considered to be conserved venom 

21 9 toxins, since even from its earliest discovery in the venom of the puff adder (Bitis arietans) 

220 there has been ". . .no evidence that it is connected to the toxicity of the venom" [44]. 

221 Dipeptidyl peptidases 

222 We find identical transcripts encoding dipeptidyl peptidase 3 and 4 in all tissues in all species 

223 except the leopard gecko (Figures 2, 4a and 4b), and both of these have a low transcript 

224 abundance in the venom gland of E. coloratus. dpp4 is expressed in garter snake liver and 

225 Anole testis and ovary and dpp3 is also expressed in garter snake liver, king cobra pooled 

226 tissues and Bearded dragon brain (Figures 4a and b). It is therefore unlikely the either dpp3 or 

227 dpp4 represent venom toxins. 

228 Epididymal secretory protein 

229 We find one transcript encoding epididymal secretory protein (ESP) expressed in the venom 

230 gland of Echis coloratus corresponding to type El . This transcript is also found to be 

231 expressed at similar levels in the scent gland and skin of this species and orthologous 

232 transcripts are expressed in all three tissues of all other species used in this study (Figure 2 

233 and Supplementary figure S4a), suggesting that this is a ubiquitously expressed gene and not 
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234 a venom component. Previously described epididymal secretory protein sequences from 

235 varanids [78] and the colubrid Cylindrophis ruffus [21] do not represent esp-el and their true 

236 orthology is currently unclear. However, our analysis of these and related sequences suggests 

237 that they are likely part of a reptile-specific expansion of esp-like genes and that the Varanus 

238 and Cylindrophis sequences do not encode the same gene (Supplementary figure S4b). 

239 Therefore there is not, nor was there ever, any evidence that epididymal secretory protein 

240 sequences represent venom components in the Toxicofera. 

241 Ficolin ("veficolin") 

242 We find one transcript encoding ficolin in the E. color atus venom gland and identical 

243 transcripts in both scent gland and skin (Figure 2, Supplementary figure S5) and orthologus 

244 transcripts in all corn snake and leopard gecko tissues, as well as rough green snake salivary 

245 and scent glands and royal python salivary gland. Paralogous genes expressed in multiple 

246 tissues were also found in corn snake and rough green snake (Supplementary figure S5). 

247 These findings, together with additional data from available transcriptomes of pooled garter 

248 snake body tissues and bearded dragon and chicken brains show that Ficolin does not 

249 represent a Toxicoferan venom component. 

250 Hyaluronidase 

251 Hyaluronidase has been suggested to be a "venom spreading factor" to aid the dispersion of 

252 venom toxins throughout the body of envenomed prey, and as such it does not represent a 

253 venom toxin itself [79]. We do however find two hyaluronidase genes expressed in the 

254 venom gland of E. color atus. The first appears to be venom gland specific (based on 

255 available data) and has two splice variants including a truncated variant similar to sequences 

256 previously characterised from Echis carinatus sochureki (accession number DQ840262) and 

257 Echis pyramidum leakeyi (accession number DQ840255) venom glands [48]. Although we 

258 cannot rule out hyaluronidase in playing an active (but non-toxic) role in Echis venom, it is 

259 worth commenting that hyaluronan has been suggested to have a role in wound healing and 
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260 the protection of the oral mucosa in human saliva [80]. The expression of hyaluronidases 

261 involved in hyaluronan metabolism in venom and/or salivary glands is therefore perhaps 

262 unsurprising. 

263 Kallikrein 

264 We find two Kallikrein-like sequences in E. coloratus, one of which is expressed in all three 

265 tissues in this species (at a low level in the venom gland) and a variety of other tissues in the 

266 other study species, and one of which is found only in scent gland and skin (Figure 2, 

267 Supplementary figure S6). These genes do not represent venom toxins in E. coloratus and 

268 appear to be most closely-related to a group of mammalian Kallikrein (KLK) genes 

269 containing KLK1, 11, 14 and 15 and probably represent the outgroup to a mammalian- 

270 specific expansion of this gene family. The orthology of previously published Toxicoferan 

271 Kallikrein genes is currently unclear and the majority of these sequences can be found in our 

272 serine protease tree (see later section and Supplementary figure SI 9). 

273 Kunitz 

274 We find a number of transcripts encoding Kunitz-type protease inhibitors in our tissue data, 

275 with the majority of these encoding kunitzl and kunitz2 genes (Figure 2 and Supplementary 

276 figure S7). The tissue distribution of these transcripts, together with the phylogenetic position 

277 of lizard and venomous snake sequences does not support a monophyletic clade of venom 

278 gland- specific Kunitz-type genes in the Toxicofera. The presence of protease inhibitors in 

279 reptile venom and salivary glands should perhaps not be too surprising and it again seems 

280 likely that the involvement of Kunitz-type inhibitors in venom toxicity in some advanced 

281 snake lineages (in this case mamba (Dendroaspis sp.) dendrotoxins and krait (Bungarus 

282 multicinctus) bungarotoxins [81, 82]) has led to confusion when non-toxic orthologs have 

283 been identified in other species. 

284 Lysosomal acid lipase 
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285 We find two transcripts encoding Lysosomal acid lipase genes in the E. coloratus venom 

286 gland transcriptome, one of which (lipa-a) is also expressed in skin and scent gland in this 

287 species and all three tissues in our other study species, lipa-a, despite not being venom gland 

288 specific, is more highly expressed in the venom gland (3,337.33 FPKM) than in the scent 

289 gland (484.49 FPKM) and skin (22.79 FPKM) of E. coloratus, although there is no evidence 

290 of elevated expression in the salivary glands of our other study species. As this protein is 

291 involved in lysosomal lipid hydrolysis [83] and the venom gland is a highly active tissue, we 

292 suggest that this elevated expression is likely related to high cell turnover. Transcripts of lipa- 

293 b are found at a low level in the venom and scent glands of E. coloratus and the scent gland 

294 of royal python (Figure 2, Supplementary figure S8). Neither lipa-a or lipa-b therefore 

295 encode venom toxins. 

296 Natriuretic peptide 

297 We find only a single natriuretic peptide-like sequence in our dataset, in the skin of the royal 

298 python. The absence of this gene from the rest of our study species suggests that it is not a 

299 highly conserved Toxicoferan toxin. 

300 Nerve growth factor 

301 We find identical transcripts encoding nerve growth factor (ngf) in all three E. coloratus 

302 tissues. Transcripts encoding the orthologous gene are also found in the corn snake salivary 

303 gland and scent gland; rough green snake scent gland and skin; royal python skin and leopard 

304 gecko salivary gland, scent gland and skin (Figure 2 and Supplementary figure S9). ngf is 

305 expressed at a higher level in the venom gland (525.82 FPKM) than in the scent gland (0.18 

306 FPKM) and skin (0.58 FPKM) of E. coloratus, but not at an elevated level in the salivary 

307 gland of other species, again hinting at the potential for exaptation of this gene. Based on 

308 these findings, together with the expression of this gene in garter snake pooled tissues, we 

309 suggest that ngf does not encode a Toxicoferan toxin. However, we do find evidence for the 

31 0 duplication of ngf in cobras (Supplementary figure S9) suggesting that it may represent a 
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31 1 venom toxin in at least some advanced snakes [84]. As with complement c3, it seems likely 

31 2 that the identification of non-toxic orthologs in distantly-related species has led to the 

313 conclusion that ngfis a widely-distributed venom toxin and confused its true evolutionary 

31 4 history. 

31 5 Phospholipase A ? (PLA ? Group HE) 

31 6 We find transcripts encoding Group IIE PLA2 genes in the venom gland of E. coloratus and 

31 7 the salivary glands of all other species (Figure 2, Supplementary figure S10). Although this 

31 8 gene appears to be venom and salivary-gland- specific (based on available data), its presence 

31 9 in all species (including the non-Toxicoferan leopard gecko) suggests that it does not 

320 represent a toxic venom component. 

321 Phospholipase B 

322 We find a single transcript encoding phospholipase b expressed in all three E. coloratus 

323 tissues (Figures 2 and 5) and transcripts encoding the orthologous gene are found in all other 

324 tissues from all study species with the exception of rough green snake salivary gland. We also 

325 find plb transcripts in corn snake vomeronasal organ, garter snake liver, Burmese python 

326 pooled tissues (liver and heart) and bearded dragon brain (Figure 5). The two transcripts in 

327 each of rough green snake and corn snake are likely alleles or the result of individual 

328 variation, and actually represent a single phospholipase b gene from each of these species. 

329 Transcript abundance analysis shows this gene to be expressed at a low level in all tissues 

330 from all study species. Based on the phylogenetic and tissue distribution of this gene it is 

331 unlikely to represent a Toxicoferan venom toxin. 

332 Renin ("renin aspartate protease") 

333 We find a number of transcripts encoding renin-like genes in the E. coloratus venom gland 

334 (Figures 2 and 6), one of which (encoding the canonical renin) is also expressed in the scent 

335 gland and is orthologous to a previously described sequence from the venom gland of the 

336 ocellated carpet viper (Echis ocellatus, accession number CAJ55260). We also find that the 
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337 recently-published Boa constrictor renin aspartate protease {rap) gene (accession number 

338 JX467165 [21]) is in fact a cathepsin d gene, transcripts of which are found in all three 

339 tissues in all five of our study species. We suggest that this misidentification may be due to a 

340 reliance on BLAST-based classification, most likely using a database restricted to squamate 

341 or serpent sequences. It is highly unlikely that either renin or cathepsin d (or indeed any 

342 renin-like aspartate proteases) constitute venom toxins in E. coloratus or E. ocellatus, nor do 

343 they represent basal Toxicoferan toxins. 

344 Ribonuclease 

345 Ribonucleases have been suggested to have a role in the generation of free purines in snake 

346 venoms [58] and the presence of these genes in the salivary glands of two species of lizard 

347 (Gerrhonotus infernalis and Celestus warreni) and two colubrid snakes {Liophis peocilogyrus 

348 and Psammophis mossambicus) has been used to support the Toxicofera [78, 85]. We did not 

349 identify orthologous ribonuclease genes in any of our salivary or venom gland data, nor do 

350 we find them in venom gland transcriptomes from the Eastern diamondback rattlesnake, king 

351 cobra and eastern coral snake (although we have identified a wide variety of other 

352 ribonuclease genes). The absence of these genes in seven Toxicoferans, coupled with the fact 

353 that they were initially described from only 2 out of 1 1 species of snake [85] and 3 out of 18 

354 species of lizard [78] would cast doubt on their status as conserved Toxicoferan toxins. 

355 Three finger toxins (3ftx) 

356 We find 2 transcripts encoding three finger toxin (3ftx)-like genes expressed in the E. 

357 coloratus venom gland, one of which is expressed in all 3 tissues (3ftx-a) whilst the other is 

358 expressed in the venom and scent glands (3ftx-b). Orthologous transcripts of 3ftx-a are found 

359 to be expressed in all three tissues of corn snake, rough green snake salivary gland and skin, 

360 and royal python salivary gland. An ortholog of 3ftx-b is expressed in rough green snake 

361 scent gland. We also find a number of different putative 3ftx genes in our other study species, 

362 often expressed in multiple tissues (Figure 2, Supplementary figure SI 1). Based on the 
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363 phylogenetic and tissue distribution of both of these genes we suggest that they do not 

364 represent venom toxins in E. coloratus. As with other proposed Toxicoferan genes such as 

365 complement c3 and nerve growth factor, it seems likely that 3ftx genes are indeed venom 

366 components in some species, especially cobras and other elapids [31, 69], and that the 

367 identification of their non- venom orthologs in other species has led to much confusion 

368 regarding the phylogenetic distribution of these toxic variants. 

369 Vespryn 

370 We do not find vespryn transcripts in any E. coloratus tissues, although this gene is present in 

371 the genome of this species (accession number KF1 14032). We do however find transcripts 

372 encoding this gene in the salivary and scent glands of the corn snake, and skin and scent 

373 glands of the rough green snake, royal python and leopard gecko (Figure 2, Supplementary 

374 figure S12). We suggest that the tissue distribution of this gene in these species casts doubt 

375 on its role as a venom component in the Toxicofera. 

376 Waprin 

377 We find a number of "waprin"-like genes in our dataset, expressed in a diverse array of body 

378 tissues. Our phylogenetic analyses (Supplementary figure S13) show that previously 

379 characterised "waprin" genes [8, 66, 68, 86, 87] most likely represent WAP four-disulfide 

380 core domain 2 (wfdc2) genes which have undergone a squamate- specific expansion and that 

381 there is no evidence for a venom gland-specific paralog. It is unlikely therefore that these 

382 genes represent a Toxicoferan venom toxin. Indeed, the inland taipan {Oxyuranus 

383 microlepidotus) "Omwaprin" has been shown to be ". . .non-toxic to Swiss albino mice at 

384 doses of up to 10 mg/kg when administered intraperitoneally" [68] and is more likely to have 

385 an antimicrobial function in the venom or salivary gland. 

386 Implications for venom composition and complexity in Echis coloratus 
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387 The following genes show either a venom gland-specific expression or an elevated expression 

388 level in this tissue, but not both and as such we suggest that whilst they may represent venom 

389 toxins in E. coloratus, further analysis is needed in order to confirm this. 

390 Vascular endothelial growth factor 

391 We find four transcripts encoding vascular endothelial growth factor (VEGF) expressed in 

392 the venom gland of E. coloratus. These correspond to vegf-a, vegf-b, vegf-c and vegf-f and of 

393 these vegf-a, b and c are also expressed in the skin and scent gland of this species (Figure 2). 

394 Transcripts encoding orthologs of these genes are expressed in all three tissues of all other 

395 species used in this study (with the exception of the absence of vegf-a in corn snake skin). In 

396 accordance with previous studies [7] we find evidence of alternative splicing of vegf-a 

397 transcripts in all species although no variant appears to be tissue-specific. It is likely that a 

398 failure to properly recognise and classify alternatively spliced vegf-a transcripts (Aird et al. 

399 2013) may have contributed to an overestimation of snake venom complexity, vegf-d was 

400 only found to be expressed in royal python salivary gland and scent gland and all three tissues 

401 from leopard gecko (Figure 2, Supplementary figure S14). The transcript encoding VEGF-F 

402 is found only in the venom gland of E. coloratus and, given the absence of any Elapid vegf-f 

403 sequences in public databases as well as absence of this transcript in the two species of 

404 colubrid in our study, we suggest that vegf-f is specific to vipers. Whilst vegf-f has a higher 

405 transcript abundance in E. coloratus venom gland (186.73 FPKM) than vegf-a (3.24 FPKM), 

406 vegf-b (1.28 FPKM) and vegf-c (1.54 FPKM), compared to other venom genes in this species 

407 (see next section) it has a considerably lower transcript abundance suggesting it represents at 

408 most a minor venom component in E. coloratus. 

409 L-amino acid oxidase 

41 0 We find transcripts encoding two l-amino acid oxidase (laao) genes in E. coloratus, one of 

41 1 which (laao-b) has two splice variants (Figure 2, Supplementary figure SI 5). laao-a 

412 transcripts are found in all three E. coloratus and leopard gecko tissues, laao-b is venom 
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413 gland- specific in E. coloratus (based on the available data) and transcripts of the orthologous 

41 4 gene are found in the scent glands of corn snake, rough green snake and royal python. The 

41 5 splice variant laao-b2 may represent a venom toxin in E. coloratus based on its specific 

416 expression in the venom gland of this species and elevated expression level (628.84 FPKM). 

417 Crotamine 

41 8 We find a single crotamine-like transcript in E. coloratus, in the venom gland (Figure 2). 

41 9 Related genes are found in a variety of tissues in other study species (including the scent 

420 gland of the rough green snake, the salivary gland and skin of the leopard gecko, and in all 

421 three corn snake tissues), although the short length of these sequences precludes a definitive 

422 statement of orthology. This gene may represent a toxic venom component in E. coloratus 

423 based on its tissue distribution, but due to its low transcript abundance (10.95 FPKM) it is 

424 likely to play a minor role, if any. 
425 

426 The following genes are found only in the venom gland of E. coloratus and clearly show an 

427 elevated expression level (Figure 7). Whilst we classify these genes as encoding venom 

428 toxins in this species (Table 1) it should be noted that none of these genes support the 

429 monophyly of Toxicoferan venom toxins. 

430 Cysteine-rich secretory proteins (CRISPs) 

431 We find transcripts encoding two distinct CRISPs expressed in the E. coloratus venom gland, 

432 one of which is also found in skin and scent gland (Figure 2). Phylo genetic analysis of these 

433 genes (which we call crisp-a and crisp-b) reveals that they appear to have been created as a 

434 result of a gene duplication event earlier in the evolution of advanced snakes (Supplementary 

435 Figure S16). crisp-a transcripts are also found in all three corn snake tissues, as well as rough 

436 green snake skin and scent gland and royal python scent gland, crisp-b is also found in corn 

437 snake salivary gland (Figure 2 and Supplementary figure SI 6) and the phylogenetic and 

438 tissue distribution of this gene suggest that it does indeed represent a venom toxin, produced 
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439 via duplication of an ancestral crisp gene that was expressed in multiple tissues, including the 

440 salivary gland. The elevated transcript abundance of crisp-b (3,520.07 FPKM) in the venom 

441 gland of E. coloratus further supports its role as a venom toxin in this species (Figure 7). The 

442 phylogenetic and tissue distribution and low transcript abundance of crisp-a (0.61 FPKM in 

443 E. coloratus venom gland) shows that it is unlikely to be a venom toxin. We also find no 

444 evidence of a monophyletic clade of reptile venom toxins and therefore suggest that, contrary 

445 to earlier reports [20, 78], the CRISP genes of varanid and helodermatid lizards do not 

446 represent shared Toxicoferan venom toxins and, if they are indeed toxic venom components, 

447 have been recruited independently from those of the advanced snakes. Regardless of their 

448 status as venom toxins, it appears likely that the diversity of CRISP genes in varanid lizards 

449 in particular [17] has been overestimated as a result of the use of negligible levels of 

450 sequence variation to classify transcripts as representing distinct gene products 

451 (Supplementary figures S23 and S24). 

452 C-type lectins 

453 We find transcripts encoding 1 1 distinct C-type lectin genes in the E. coloratus venom gland, 

454 one of which (ctl-a) is also expressed in the scent gland of this species. The remaining 10 

455 genes (ctl-b to k) are found only in the venom gland and form a clade with other viper C-type 

456 lectin genes (Figure 2, Supplementary figure S17). Of these, 6 are highly expressed in the 

457 venom gland (ctl-b to d, ctl-f to g and ctl-j) with a transcript abundance range of 3,706.21- 

458 24,122.41 FPKM (Figure 7). The remainder of these genes (ctl-e, ctl-h to i and ctl-k) show 

459 lower transcript abundance (0.80-1,475.88 FPKM), with two (ctl-i and k) being more lowly 

460 expressed than ctl-a (230.06 FPKM). A number of different C-type lectin genes are found in 

461 our other study species, often expressed in multiple tissues (Supplementary figure S17). We 

462 suggest therefore that the 6 venom-gland specific C-type lectin genes which are highly 

463 expressed are indeed venom toxins in E. coloratus and that these genes diversified via the 

464 duplication of an ancestral gene with a wide expression pattern, including in salivary/venom 
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465 glands. Based on their selective expression in the venom gland (from available data) the 

466 remaining four C-type lectin genes cannot be ruled out as putative toxins, although their 

467 lower transcript abundance suggests that they are likely to be minor components in E. 

468 coloratus venom. It should also be noted that a recent analysis of king cobra (Ophiohagus 

469 hannah) venom gland transcriptome and proteome suggested that ". . .lectins do not contribute 

470 to king cobra envenoming" [31]. 

471 Phospholipase A 2 (PLA ^ Group IIA) 

472 We find five transcripts encoding Group IIA PLA2 genes in E. coloratus, three of which are 

473 found only in the venom gland and two of which are found only in the scent gland (these 

474 latter two likely represent intra-individual variation in the same transcript) (Figure 2, 

475 Supplementary figure SI 8). The venom gland- specific transcript PLA2 IIA-c is highly 

476 expressed (22,520.41 FPKM) and likely represents a venom toxin, and may also be a putative 

477 splice variant although further analysis is needed to confirm this. PLA2 IIA-d and IIA-e show 

478 an elevated, but lower, expression level (1,677.15 FPKM and 434.67 FPKM respectively, 

479 Figure 7). Based on tissue and phylogenetic distribution we would propose that these three 

480 genes may represent putative venom toxins (Table 1). 

481 Serine proteases 

482 We find 6 transcripts encoding Serine proteases in E. coloratus (Figure 2, Supplementary 

483 figure S19) which (based on available data) are all venom gland specific. Four of these 

484 transcripts are highly expressed in the venom gland {serine proteases a-c and e; 3,076.01- 

485 7,687.03 FPKM) whilst two are expressed at a lower level {serine proteases d and/; 1,098.45 

486 FPKM and 102.34 FPKM respectively, Figure 7). Based on these results we suggest serine 

487 proteases a, b, c and e represent venom toxins whilst serine proteases d and /may represent 

488 putative venom toxins (Table 1). 

489 Snake venom metalloproteinases 
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490 We find 21 transcripts encoding snake venom metalloproteinases in E. coloratus and of these 

491 14 are venom gland-specific, whilst another (svmp-n) is expressed in the venom gland and 

492 scent gland. Five remaining genes are expressed in the scent gland only whilst another is 

493 expressed in the skin (Figure 2, Supplementary figure S20). Of the 14 venom gland- specific 

494 SVMPs we find 4 to be highly expressed (5,552.84-15,1 18.41 FPKM, Figure 7). In the 

495 absence of additional data, we classify the 13 venom gland- specific svmp genes as venom 

496 toxins in this species (Table 1). 
497 

498 Discussion 

499 Our transcriptomic analyses have revealed that all 16 of the basal venom toxin genes used to 

500 support the hypothesis of a single, early evolution of venom in reptiles (the Toxicofera 

501 hypothesis [17-21, 78]), as well as a number of other genes that have been proposed to 

502 encode venom toxins in multiple species are in fact expressed in multiple tissues, with no 

503 evidence for consistently higher expression in venom or salivary glands. Additionally, only 

504 two genes in our entire dataset of 74 genes in five species were found to encode possible 

505 venom gland- specific splice variants (l-amino acid oxidase b2 and PLA2 IIA-c). We therefore 

506 suggest that many of the proposed basal Toxicoferan genes most likely represent 

507 housekeeping or maintenance genes and that the identification of these genes as conserved 

508 venom toxins is a side-effect of incomplete tissue sampling. This lack of support for the 

509 Toxicofera hypothesis therefore prompts a return to the previously held view [88] that venom 

51 0 in different lineages of reptiles has evolved independently, once at the base of the advanced 

51 1 snakes, once in the helodermatid (gila monster and beaded lizard) lineage and, possibly, one 

51 2 other time in monitor lizards, although evidence for a venom system in this latter group [20, 

51 3 78, 89] may need to be reinvestigated in light of our findings. The process of reverse 

514 recruitment [70], where a venom gene undergoes additional gene duplication events and is 

51 5 subsequently recruited from the venom gland back into a body tissue (which was proposed on 
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51 6 the basis of the placement of garter snake and Burmese python "physiological" genes within 

51 7 clades of "venom" genes) must also be re-evaluated in light of our findings. 

51 8 Bites by venomous snakes are thought to be responsible for as many as 1,841,000 

519 envenomings and 94,000 deaths annually, predominantly in the developing world [90, 91] 

520 and medical treatment of snakebite is reliant on the production of antivenoms containing 

521 antibodies, typically from sheep or horses, that will bind and neutralise toxic venom proteins 

522 [92]. Since these antivenoms are derived from the injection of crude venom into the host 

523 animal they are not targeted to the most pathogenic venom components and therefore also 

524 include antibodies to weakly- or non-pathogenic proteins, requiring the administration of 

525 large or multiple doses [11], increasing the risks of adverse reactions. A comprehensive 

526 understanding of snake venom composition is therefore vital for the development of the next 

527 generation of antivenoms [2, 11, 93] as it is important that research effort is not spread too 

528 thinly through the inclusion of non-toxic venom gland transcripts. Our results suggest that 

529 erroneous assumptions about the single origination and functional conservation of venom 

530 toxins across the Toxicofera has led to the complexity of snake venom being overestimated 

531 by previous authors, with the venom of the painted saw-scaled viper, Echis coloratus likely 

532 consisting of just 34 genes in 8 gene families (Table 1, based on venom gland- specific 

533 expression and a 'high' expression, as defined by presence in the top 25% of transcripts [94] 

534 in at least two of four venom gland samples), fewer than has been suggested for this and 

535 related species in previous EST or transcriptomic studies [5, 35]. However, it is noteworthy 

536 that the results of our analyses accord well with proteomic analyses of venom composition in 

537 snakes, which range from an almost identical complement of 35 toxins in 8 gene families for 

538 the related ocellated carpet viper, Echis ocellatus [36] to between 24-61 toxins in 6-14 

539 families in a range of other species (Table 2). Far from being a "complex cocktail" [10, 11, 

540 95, 96], snake venom may in fact represent a relatively simple mixture of toxic proteins 
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541 honed by natural selection for rapid prey immobilisation, with limited lineage-specific 

542 expansion in one or a few particular gene families. 

543 In order to avoid continued overestimation of venom complexity, we propose that future 

544 transcriptome-based analyses of venom composition must include quantitative comparisons 

545 of multiple body tissues from multiple individuals and robust phylogenetic analysis that 

546 includes known paralogous members of gene families. We would also encourage the use of 

547 clearly explained, justifiable criteria for classifying highly similar sequences as new paralogs 

548 rather than alleles or the result of PCR or sequencing errors, as it seems likely that some 

549 available sequences from previous studies have been presented as distinct genes on the basis 

550 of extremely minor (or even non-existent) sequence variation (see Supplementary figures 

551 S21-S24 for examples of identical or nearly identical ribonuclease and CRISP sequences and 

552 Supplementary figures S25 and S26 for examples of the same sequence being annotated as 

553 two different genes). As a result, the diversity of "venom" composition in these species may 

554 have been inadvertently inflated. 

555 Additionally, we would encourage the adoption of a standard nomenclature for reptile genes, 

556 as the overly-complicated and confusing nomenclature used currently (Table 3) may also 

557 contribute to the perceived complexity of snake venom. We propose that such a nomenclature 

558 system should be based on the comprehensive standards developed for Anole lizards [97], for 

559 example: 

560 • "Gene symbols for all. . .species should be written in lower case only and in italics, 

561 e.g., gene2." 

562 • "Whenever criteria for orthology have been met. . . the gene symbol should be 

563 comparable to the human gene symbol, e.g., if the human gene symbol is GENE2, 

564 then the gene symbol would be gene2." 
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565 • "Duplication of the ortholog of a mammalian gene will be indicated by an "a" or "b" 

566 suffix, e.g., gene2a and gene2b. If the mammalian gene symbol already contains a 

567 suffix letter, then there would be a second letter added, e.g., gene4aa and gene4abT 

568 It seems likely that the application of our approach to other species (together with proteomic 

569 studies of extracted venom) will lead to a commensurate reduction in claimed venom 

570 diversity, with clear implications for the development of next generation antivenoms: since 

571 most true venom genes are members of a relatively small number of gene families, it is likely 

572 that a similarly small number of antibodies may be able to bind to and neutralise the toxic 

573 venom components, especially with the application of "string of beads" techniques [98] 

574 utilising fusions of short oligopeptide epitopes designed to maximise the cross -reactivity of 

575 the resulting antibodies [2]. 
576 

577 Conclusions 

578 We suggest that identification of the apparently conserved Toxicofera venom toxins in 

579 previous studies is most likely a side effect of incomplete tissue sampling, compounded by 

580 incorrect interpretation of phylogenetic trees and the use of BLAST-based gene identification 

581 methods. It should perhaps not be too surprising that homologous tissues in related species 

582 would show similar gene complements and the restriction of most previous studies to only the 

583 "venom" glands means that monophyletic clades of reptile sequences in phylogenetic trees 

584 have been taken to represent monophyletic clades of venom toxin genes. Whilst it is true that 

585 some of these genes do encode toxic proteins in some species (indeed, this was often the 

586 basis for their initial discovery) the discovery of orthologous genes in other species does not 

587 necessarily demonstrate shared toxicity. In short, toxicity in one does not equal toxicity in all. 
588 

589 Methods 



23 



Downloaded from http://biorxiv.org/ on September 18, 2014 

590 Experimental methods involving animals followed institutional and national guidelines and 

591 were approved by the Bangor Universty Ethical Review Committee. 

592 RNA-Seq 

593 Total RNA was extracted from four venom glands taken from four individual specimens of 

594 adult Saw-scaled vipers (Echis coloratus) at different time points following venom extraction 

595 in order to capture the full diversity of venom genes (16, 24 and 48 hours post-milking). 

596 Additionally, total RNA from two scent glands and two skin samples of this species and the 

597 salivary, scent glands and skin of two adult corn snakes (Pantherophis guttatus), rough green 

598 snakes (Opheodrys aestivus), royal pythons {Python regius) and leopard geckos {Eublepharis 

599 macularius) was also extracted using the RNeasy mini kit (Qiagen) with on-column DNase 

600 digestion. Only a single corn snake skin sample provided RNA of high enough quality for 

601 sequencing. mRNA was prepared for sequencing using the TruSeq RNA sample preparation 

602 kit (Illumina) with a selected fragment size of 200-500bp and sequenced using lOObp paired- 

603 end reads on the Illumina HiSeq2000 or HiSeq2500 platform. 

604 Quality control, assembly and analysis 

605 The quality of all raw sequence data was assessed using FastQC [99] and reads for each 

606 tissue and species were pooled and assembled using Trinity [100] (sequence and assembly 

607 metrics are provided in Supplemental tables S1-S3). Putative venom toxin amino acid 

608 sequences were aligned using ClustalW [101] and maximum likelihood trees constructed 

609 using the Jones-Taylor-Thornton (JTT) model with 500 Bootstrap replicates. Transcript 

61 0 abundance estimation was carried out using RSEM [102] as a downstream analysis of Trinity 

61 1 (version trinityrnaseq_r20 12-04-27). Sets of reads were mapped to species-specific reference 

612 transcriptome assemblies (Supplementary table S4) to allow comparison between tissues on a 

613 per-species basis and all results values shown are in FPKM (Fragments Per Kilobase of exon 

61 4 per Million fragments mapped). Individual and mean FPKM values for each gene per tissue 

61 5 per species are given in Supplementary tables S5-S9. All transcript abundance values given 
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61 6 within the text are based on the average transcript abundance per tissue per species to account 

61 7 for variation between individual samples. 

61 8 Transcriptome reads were deposited in the European Nucleotide Archive (ENA) database 

61 9 under accession #ERP001222 and GenBank under accession numbers XXX and XXX and 

620 genes used to reconstruct phylogenies are deposited in GenBank under accession numbers 

621 XXXX-XXXX 
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1077 Tables 
1078 

1079 Table 1. Predicted venom composition of the painted saw-scaled viper, Echis coloratus 
1080 



Gene family 


Number of genes 


SVMP 


13 


C-type lectin 


8 


Serine protease 


6 


PLA2 


3 


CRISP 


1 


L-amino acid oxidase 


1 


VEGF 


1 


Crotamine 


1 


Total 8 


34 
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1098 Table 2. Predicted numbers of venom toxins and venom toxin families from proteomic 

1 099 studies of snake venom accord well with our transcriptome results. 
1100 
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toxins 


families 


Bitis caudalis [103] 


30 


8 


Bitis gabonica gabonica [104] 


35 


12 


Bitis gabonica rhinoceros [103] 


33 


11 


Bitis nasicornis [103] 


28 


9 


Bothriechis schlegelii [105] 


? 


7 


Cerastes cerastes [106] 


25-30 


6 


Crotalus atrox [107] 


-24 


~9 


Echis ocellatus [36] 


35 


8 


Lachesis muta [108] 


24-26 


8 


Naja kaouthia [109] 


61 


12 


Ophiophagus hannah [31] 


7 


14 


Vipera ammodytes [110] 


38 


9 
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1118 Table 3. Venom gene nomenclature. Lack of a formal set of nomenclatural rules for venom 

1119 toxins has led to an explosion of different gene names and may have contributed to the 

1 1 20 overestimation of reptile venom diversity. 



Gene/gene family 


Alternative name and accession number 


3 Finger toxin (3Ftx) 


Denmotoxin [Q06ZW0] 
Candoxin [AY142323] 


CRISP 


Piscivorin [AA062994] 
Catrin [AA062995] 
Ablomin [AAM45664] 
Tigrin [Q8JGT9] 

Kaouthin [ACH73167, ACH73168] 

Natrin-1 [Q7T1K6] 

CRVP [Q8UW25, Q8UW11] 

Pseudechetoxin [Q8AVA4] 

Pseudechin [Q8AVA3] 

Serotriflin [P0CB15] 

Latisemin [Q8JI38] 

Ophanin [AA062996] 

Opharin [ACN93671] 

Bc-CRP [ACE73577, ACE73578] 


Ficolin 


Veficolin [ADK46899] 
Ryncolin [D8VNS7-9, D8VNT0] 


Serine proteases 


Acubin [CAB46431] 
Gyroxin [B0FXM3] 
Ussurase [AAL48222] 
Serpentokallikrein [AAG27254] 
Salmobin [AAC61838] 
Batroxobin [AAA48553] 
Nikobin [CBW30778] 
Gloshedobin [POC5B4] 
Gussurobin [Q8UVX1] 
Pallabin [CAA04612] 
Pallase [AAC34898] 


Snake venom metalloproteinase 
(SVMP) 


Stejnihagin-B [ABA40759] 
Bothropasin [AAC61986] 
Atrase B [ADG02948] 
Mocarhagin 1 [AAM51550] 
Scutatease-1 [ABQ01138] 
Austrelease-1 [ABQ01134] 


Vascular endothelial growth factor 
(VEGF) 


Barietin [ACN22038] 
Cratrin [ACN22040] 
Apiscin [ACN22039] 
Vammin [ACN22045] 


Vespryn 


Ohanin [AAR07992] 
Thaicobrin [P82885] 


Waprin 


Nawaprin [P60589] 
Porwaprin [B5L5N2] 
Stewaprin [B5G6H3] 
Veswaprin [B5L5P5] 
Notewaprin [B5G6H5] 
Carwaprin [B5L5P0] 
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1121 Figure legends 

1122 

1 1 23 Figure 1. Relationships of key vertebrate lineages and the placement of species 

1 1 24 discussed in this paper. A monophyletic clade of reptiles (which includes birds) is 

1 1 25 shaded green and the Toxicofera [21] are shaded red. Modified taxon names 

1 1 26 have been used for simplicity. 
1127 

1 1 28 Figure 2. Tissue distribution of proposed venom toxin transcripts. The majority of 

1 1 29 transcripts proposed to encode Toxicoferan venom proteins are expressed in multiple body 

1 1 30 tissues. Transcripts found in the assembled transcriptomes but which are assigned transcript 

1 131 abundance of <1 FPKM are shaded orange. Eco, painted saw-scaled viper (Echis coloratus); 

1 132 Pgu, corn snake (Pantherophis guttatus); Oae, rough green snake (Opheodrys aestivus); Pre, 

1 1 33 royal python {Python regius); Ema, leopard gecko {Eublepharis macularius). VG, venom 

1 1 34 gland; SAL, salivary gland; SCG, scent gland; SK, skin. 
1135 

1 1 36 Figure 3. Maximum likelihood tree of complement c3 ("cobra venom factor'''') sequences. 

1 1 37 Whilst most sequences likely represent housekeeping or maintenance genes, a gene 

1 1 38 duplication event in the elapid lineage (marked with *) may have produced a venom-specific 

1 1 39 paralog. An additional duplication (marked with +) may have taken place in Austrelaps 

1 1 40 superbus, although both paralogs appear to be expressed in both liver and venom gland. 

1 1 41 Geographic separation in king cobras (Ophiophagus hannah) from Indonesia and China is 

1 1 42 reflected in observed sequence variation. Numbers above branches are Bootstrap values for 

1 1 43 500 replicates. Tissue distribution of transcripts is indicated using the following 

1 144 abbreviations: VG, venom gland; SK, skin; SCG, scent gland, AG, accessory gland; VMNO, 

1 1 45 vomeronasal organ and those genes found to be expressed in one or more body tissues are 

1 1 46 shaded blue. 
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1 1 48 Figure 4. Maximum likelihood tree of dipeptidylpeptidase 3 (dpp3) and 

1 1 49 dipeptidylpeptidase 4 (dpp4) sequences. Transcripts encoding dpp3 and dpp4 axe found in a 

1 1 50 wide variety of body tissues, and likely represent housekeeping genes. Numbers above 

1 1 51 branches are Bootstrap values for 500 replicates. Tissue distribution of transcripts is indicated 

1 1 52 using the following abbreviations: VG, venom gland; SK, skin; SCG, scent gland, AG, 

1 1 53 accessory gland; VMNO, vomeronasal organ and those genes found to be expressed in one or 

1 1 54 more body tissues are shaded blue. 
1155 

1 1 56 Figure 5. Maximum likelihood tree of phospholipase b (plb) sequences. Transcripts 

1 1 57 encoding plb axe found in a wide variety of body tissues, and likely represent housekeeping 

1 1 58 genes. Numbers above branches are Bootstrap values for 500 replicates. Tissue distribution of 

1 1 59 transcripts is indicated using the following abbreviations: VG, venom gland; SK, skin; SCG, 

1 1 60 scent gland, AG, accessory gland; VMNO, vomeronasal organ and those genes found to be 

1 1 61 expressed in one or more body tissues are shaded blue. 
1162 

1 1 63 Figure 6. Maximum likelihood tree of renin-like sequences. Renin-like genes are 

1 1 64 expressed in a diversity of body tissues. The recently published Boa constrictor "RAP-Boa- 

1 1 65 1" sequence is clearly a cathepsin d gene and is therefore not orthologous to the Echis 

1 1 66 ocellatus renin sequence as has been claimed [21]. Numbers above branches are Bootstrap 

1 1 67 values for 500 replicates. Tissue distribution of transcripts is indicated using the following 

1 1 68 abbreviations: VG, venom gland; SK, skin; SCG, scent gland and those genes found to be 

1 1 69 expressed in one or more body tissues are shaded blue. 
1170 

1 1 71 Figure 7. Graph of transcript abundance values of proposed venom transcripts in the 

1 1 72 Echis coloratus venom gland. The majority of Toxicoferan transcripts are expressed at 
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1 1 73 extremely low level, with the most highly expressed genes falling into only four gene 

1 1 74 families (C-type lectins, Group IIA phospholipase Aa, serine proteases and snake venom 

1 1 75 metalloproteinases). FPKM = Fragments Per Kilobase of exon per Million fragments 

1 1 76 mapped. 
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